Hypothesis:
Experimental setup & sequencing:
Data analysis: Data were analysed with the edgeR and limma packages, both available through Bioconductor, in R (version 3.4.4, "Someone to Lean On"). Two additional key packages were ggplot2 and data.table.
Table 1. Data set characteristics.
| Characteristic | Value |
|---|---|
| Samples (n): | 19 |
| Groups (n): | 5 |
| Unique ENSEMBL IDs (n): | 32545 |
Fig 1. Density of log-CPM values pre- and post-filtering
Fig 1. Figure reports the density of log-CPM values for every sample (by color) before and after filtering out genes with low expression. The raw read count matrix is filtered on log-CPM values. The vertical dashed line marks the cut-off (log-CPM=0, i.e. CPM=1). The density shifts distinctly from below the threshold (Fig 1A) to above it (Fig 1B). Approximately 1/3 of the genes remain after filtering.
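The filtering rule can be sketched in plain Python (the report itself used edgeR in R). This is a minimal illustration, assuming a gene is kept when its log-CPM exceeds 0 in at least `min_samples` samples. The report does not state that criterion, and the counts, library sizes and `min_samples` value below are hypothetical.

```python
import math

def log2_cpm(count, lib_size):
    """Return log2 counts-per-million; zero counts map to -infinity."""
    cpm = count / lib_size * 1e6
    return math.log2(cpm) if cpm > 0 else float("-inf")

def filter_low_expression(counts, lib_sizes, min_samples, cutoff=0.0):
    """Keep genes whose log-CPM exceeds `cutoff` in at least `min_samples` samples."""
    kept = []
    for gene, row in counts.items():
        n_above = sum(log2_cpm(c, n) > cutoff for c, n in zip(row, lib_sizes))
        if n_above >= min_samples:
            kept.append(gene)
    return kept

# Hypothetical toy data: three genes measured in three samples.
counts = {"geneA": [0, 1, 0], "geneB": [50, 80, 65], "geneC": [3, 0, 2]}
lib_sizes = [2_000_000, 2_500_000, 1_500_000]

kept_genes = filter_low_expression(counts, lib_sizes, min_samples=2)
print(kept_genes)
```

A cut-off of log-CPM=0 is equivalent to CPM=1, so the comparison could equally be made on the untransformed CPM values.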
Fig 2. Distribution of log-CPM values pre- and post-normalization
Fig 2. Figure reports the distribution of gene expression (log-CPM) for each sample. Fig 2A reports the distribution prior to normalization, while Fig 2B reports the distribution following normalization of library sizes using the TMM algorithm. Boxplots are based on all log-CPM values, while points represent a random sample of 1e4 genes (to limit processing time). The difference between the log-CPM distributions computed with original and effective library sizes is minor, but it is adjusted for.
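To make the distinction between original and effective library sizes concrete, the sketch below computes CPM with and without a normalization factor. It assumes the edgeR convention that the effective library size is the original library size multiplied by the TMM normalization factor. The factor itself is not derived here, and all numbers are hypothetical.

```python
def cpm(count, lib_size, norm_factor=1.0):
    """Counts per million using the effective library size (lib_size * norm_factor)."""
    effective_lib_size = lib_size * norm_factor
    return count / effective_lib_size * 1e6

# Hypothetical gene count and library size for one sample.
raw_cpm = cpm(50, 2_000_000)                         # original library size
adjusted_cpm = cpm(50, 2_000_000, norm_factor=1.25)  # effective library size
print(raw_cpm, adjusted_cpm)
```

A normalization factor above 1 enlarges the effective library size and therefore lowers the CPM of every gene in that sample; a factor close to 1 changes little, which matches the minor difference seen between Fig 2A and Fig 2B.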
Fig 3. Variance explained by principal components based on the 500 genes with highest variance
Fig 3. Figure reports the variance explained by each principal component. Fig 3A reports the proportion of variance explained by each component, while Fig 3B reports the cumulative proportion of variance explained.
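The two panels are related by a simple calculation, sketched here in plain Python. The component variances are hypothetical and stand in for the variances (squared standard deviations) of the principal components.

```python
from itertools import accumulate

def variance_explained(component_variances):
    """Return (proportion, cumulative proportion) of variance per component."""
    total = sum(component_variances)
    proportion = [v / total for v in component_variances]
    cumulative = list(accumulate(proportion))
    return proportion, cumulative

# Hypothetical variances of four principal components.
proportion, cumulative = variance_explained([6.0, 2.0, 1.0, 1.0])
print(proportion)
print(cumulative)
```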
Table 2. Lower and upper bounds (bootstrapped 95 % confidence intervals) for the proportion of variance explained by principal components 1 to 10
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Lower bound: | 0.674 | 0.146 | 0.032 | 0.020 | 0.010 | 0.009 | 0.008 | 0.006 | 0.007 | 0.005 |
| Upper bound: | 0.711 | 0.176 | 0.040 | 0.026 | 0.013 | 0.012 | 0.011 | 0.008 | 0.008 | 0.007 |
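As an illustration of how such bounds can be obtained, the following is a minimal percentile-bootstrap sketch in plain Python. The report does not state which bootstrap variant was used or what was resampled, so the function names, the number of resamples and the data are all assumptions. For brevity the statistic here is a simple mean; in the setting of Table 2 it would be the proportion of variance explained by a given component, recomputed on each resample.

```python
import random
import statistics

def percentile(sorted_values, q):
    """Nearest-rank style percentile for q in [0, 1] on pre-sorted values."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]

def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for `statistic` on `data`."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    return percentile(estimates, alpha), percentile(estimates, 1 - alpha)

# Hypothetical observations (not taken from the report).
values = [0.62, 0.70, 0.68, 0.73, 0.66, 0.71, 0.69, 0.75, 0.64, 0.72]
lower, upper = bootstrap_ci(values, statistics.mean)
print(lower, upper)
```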
Fig 5. Dimensionality reduction using PCA and t-SNE of the 500 genes with highest variance
Fig 5. Figure reports the samples in low dimensional space following dimensionality reduction with PCA and t-SNE for the 500 genes with the highest variance. Ellipses are added to the samples for easier recognition of the study groups.
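Selecting the most variable genes before dimensionality reduction can be sketched as follows. This is a plain-Python illustration with hypothetical log-CPM values; the report selects the top 500 genes, whereas the toy example keeps the top 2.

```python
import statistics

def top_variable_genes(expression, n):
    """Return the names of the n genes with the highest sample variance."""
    ranked = sorted(
        expression,
        key=lambda gene: statistics.variance(expression[gene]),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical log-CPM values: one row per gene, one column per sample.
expression = {
    "geneA": [1.0, 1.1, 0.9],
    "geneB": [2.0, 6.0, 4.0],
    "geneC": [3.0, 3.5, 2.5],
}
top_genes = top_variable_genes(expression, n=2)
print(top_genes)
```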
Fig 9. Hierarchical clustering of samples using 500 most variable genes
Fig 9. Figure reports hierarchical clustering based on the 500 most variable genes. Fig 9A reports the clustering using a dendrogram while Fig 9B reports the same clustering using a circular packing plot.
Fig 10. Clustering of samples using common methods following dimensionality reduction with PCA and t-SNE
Fig 10. Figure reports the results of common clustering algorithms applied to the samples following dimensionality reduction using PCA and t-SNE. Where a train-test split was required, a 60:40 ratio (13:9 samples) was used instead of the traditional 80:20 ratio because an 80:20 split would leave too few samples in the test set.
Fig 12. Top ten loadings (absolute value) for the first two principal components
Fig 12. Figure reports the top 10 positive and negative loadings for the first and second principal components.
Fig 15. Mean-variance relationship pre- and post-voom transformation
Fig 15. Figure reports the mean-variance relationship before and after application of the voom function. Fig 15A reports the average log-CPM against the quarter root of the variance. Fig 15B reports the average log-CPM against \(\log_2(\text{st.dev})\). The blue line reports the average \(\log_2(\text{st.dev})\). The red line is a linear trend fitted to the black dots. Each black dot represents a gene. Fig 15A illustrates that the variance decreases as the average expression increases. In Fig 15B this dependency is removed: the variance stays approximately constant as the average expression increases.
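The coordinates of one point in a plot like Fig 15A can be sketched as below. This is a simplified plain-Python illustration with hypothetical values: it uses the ordinary sample standard deviation of each gene, whereas limma's voom works with residual standard deviations from a fitted linear model.

```python
import statistics

def mean_variance_point(log_cpm_values):
    """Return (average log-CPM, quarter root of the variance) for one gene."""
    mean = statistics.mean(log_cpm_values)
    st_dev = statistics.stdev(log_cpm_values)
    # The quarter root of the variance equals the square root of the st.dev.
    return mean, st_dev ** 0.5

# Hypothetical log-CPM values for two genes across three samples.
low_expressed = mean_variance_point([1.0, 2.0, 3.0])
high_expressed = mean_variance_point([4.0, 4.5, 5.0])
print(low_expressed, high_expressed)
```

In this toy example the gene with the higher average expression has the smaller quarter-root variance, which is the kind of trend the caption describes for Fig 15A.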
Fig 16. Hierarchical clustering and circular packing plot of 1000 genes with highest F-value
Fig 16. Figure reports hierarchical clustering of samples based on the 1000 genes with highest F-values. Fig 16A reports a dendrogram while Fig 16B reports a circular packing plot.
Fig 17. Number of differentially expressed genes for each contrast (FDR<0.05)
Fig 17. Fig 17A reports the number of differentially expressed genes for the contrasts Naive[2w] vs invitro, SCI[2w] vs invitro and Naive[2w] vs SCI[2w]. Fig 17B reports the number of differentially expressed genes for the contrasts Naive[2w] vs invitro, SCI[2w] vs invitro and Naive[2w] vs SCI[2w].
Fig 18. Number of differentially expressed genes for each contrast (FDR<0.05)
Fig 18. Figure reports the number of differentially expressed genes within and between the contrasts Naive[2w] vs SCI[2w] and Naive[3w] vs SCI[3w].
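The within- and between-contrast counts amount to set operations on the gene lists of the two contrasts. The sketch below uses a few gene names from the top-10 lists in Table 4 purely for illustration; they are not the full sets of differentially expressed genes.

```python
# Illustrative subsets only (names taken from the top-10 lists in Table 4).
de_2w = {"Hapln2", "Ptgds", "Gpr37", "Mal"}   # Naive[2w] vs SCI[2w]
de_3w = {"Hapln2", "Ptgds", "Gpr37", "Hcn2"}  # Naive[3w] vs SCI[3w]

shared = de_2w & de_3w    # genes found in both contrasts
only_2w = de_2w - de_3w   # genes found only in the 2w contrast
only_3w = de_3w - de_2w   # genes found only in the 3w contrast

print(len(only_2w), len(shared), len(only_3w))
```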
Table 3. Number of differentially over- and under-expressed genes for each contrast (FDR<0.05)
| | Invitro vs Naive[2w] | Invitro vs Naive[3w] | Invitro vs SCI[2w] | Invitro vs SCI[3w] |
|---|---|---|---|---|
| Downregulated: | 4862 | 4852 | 2411 | 3763 |
| No change | 5236 | 5733 | 9952 | 7574 |
| Upregulated: | 5013 | 4526 | 2748 | 3774 |
| Sum: | 15111 | 15111 | 15111 | 15111 |

| | Naive[2w] vs Naive[3w] | Naive[2w] vs SCI[2w] | Naive[2w] vs SCI[3w] | Naive[3w] vs SCI[2w] | Naive[3w] vs SCI[3w] | SCI[2w] vs SCI[3w] |
|---|---|---|---|---|---|---|
| Downregulated: | 677 | 4401 | 3685 | 3980 | 3203 | 2952 |
| No change | 13994 | 6009 | 7734 | 6441 | 8666 | 9876 |
| Upregulated: | 440 | 4701 | 3692 | 4690 | 3242 | 2283 |
| Sum: | 15111 | 15111 | 15111 | 15111 | 15111 | 15111 |
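The row counts in Table 3 follow from classifying every gene by its adjusted p-value and the sign of its log fold change. The sketch below is a plain-Python illustration, assuming the Benjamini-Hochberg adjustment (the report only states FDR<0.05, not the adjustment method); the p-values and fold changes are hypothetical.

```python
from collections import Counter

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def classify(log_fc, adj_p, fdr=0.05):
    """Label a gene as up- or downregulated, or unchanged."""
    if adj_p >= fdr:
        return "No change"
    return "Upregulated" if log_fc > 0 else "Downregulated"

# Hypothetical results for four genes in one contrast.
log_fcs = [2.0, -1.5, -3.0, 0.1]
p_values = [0.001, 0.02, 0.012, 0.5]

adj = benjamini_hochberg(p_values)
table = Counter(classify(lfc, p) for lfc, p in zip(log_fcs, adj))
print(dict(table), sum(table.values()))
```

Because every gene receives exactly one label, the three counts always add up to the number of genes tested, which is why each column of Table 3 sums to 15111.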
Fig 18. Mean-difference and volcano plots (FDR<0.05)
Fig 18. The right figure is a mean-difference plot, which illustrates the number of over- and under-expressed genes. The threshold is set at \(\log_2(\text{fold change})\) = +/-1 (blue lines). Blue dots represent genes above or below the log-fold change thresholds, while red dots represent genes that are above or below the thresholds and are significantly (p<1e-6) differentially expressed. The left figure is a volcano plot, which reports the number of significantly (p<1e-6) over- and under-expressed genes (marked in red). Blue dots represent genes that have logFC <-1 or >1 but are not significantly differentially expressed. Figures A and B are for the contrast Naive[2w] vs SCI[2w], Figures C and D for Naive[3w] vs SCI[3w], Figures E and F for Naive[2w] vs Naive[3w], and Figures G and H for SCI[2w] vs SCI[3w].
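The coloring rule described in the caption can be written as a small function. This is a plain-Python sketch: the thresholds come from the caption, while the label "other" for genes inside the fold-change thresholds is an assumption, and the caption does not say whether the p<1e-6 threshold applies to raw or adjusted p-values.

```python
def dot_color(log_fc, p_value, lfc_cutoff=1.0, p_cutoff=1e-6):
    """Return the plot color for one gene according to the caption's rule."""
    if abs(log_fc) > lfc_cutoff:
        # Outside the fold-change thresholds: red if also significant.
        return "red" if p_value < p_cutoff else "blue"
    return "other"  # assumed label; the caption does not name this group

# Values for Hapln2 and Chordc1 are taken from Table 4 (adjusted p-values);
# the third gene is hypothetical.
print(dot_color(8.19, 1.6e-15))  # Hapln2, Naive[2w] vs SCI[2w]
print(dot_color(1.02, 1.3e-03))  # Chordc1, Naive[2w] vs Naive[3w]
print(dot_color(0.50, 1.0e-09))  # hypothetical gene with a small fold change
```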
Table 4. The 10 most significantly up- and downregulated differentially expressed genes (FDR<0.05)
Contrast: Naive[2w] vs SCI[2w]
| Gene | log2(fold change) | P-value (adjusted) | Gene | log2(fold change) | P-value (adjusted) |
|---|---|---|---|---|---|
| Hapln2 | 8.19 | 1.6e-15 | Gpr37l1 | -5.12 | 3.8e-14 |
| Ptgds | 8.07 | 8.4e-15 | Fads2 | -5.12 | 3.8e-14 |
| Gpr37 | 6.26 | 8.4e-15 | Tubb2b | -4.96 | 8.5e-14 |
| Mal | 9.72 | 3.8e-14 | Limd1 | -4.04 | 8.5e-14 |
| Ldlrap1 | 6.68 | 3.8e-14 | Spry2 | -3.47 | 1.1e-13 |
| Opalin | 9.77 | 4.5e-14 | Gadd45g | -3.89 | 1.4e-13 |
| Tmem63a | 4.23 | 8.1e-14 | Gpm6a | -6.44 | 1.5e-13 |
| Sept4 | 5.54 | 8.5e-14 | Gpr17 | -7.71 | 2.4e-13 |
| Ndrg1 | 5.36 | 8.5e-14 | Cdh2 | -4.37 | 2.9e-13 |
| Dse | 3.82 | 1.1e-13 | Pmel | -6.35 | 2.9e-13 |
Contrast: Naive[3w] vs SCI[3w]
| Gene | log2(fold change) | P-value (adjusted) | Gene | log2(fold change) | P-value (adjusted) |
|---|---|---|---|---|---|
| Hcn2 | 5.07 | 1.5e-13 | Gadd45g | -4.86 | 5.6e-15 |
| Gpr37 | 4.97 | 1.8e-13 | Limd1 | -3.51 | 6.3e-13 |
| Tmem229a | 4.86 | 6.3e-13 | Spry2 | -3.05 | 6.3e-13 |
| LOC361016 | 5.40 | 6.3e-13 | Mdm2 | -3.44 | 1.0e-12 |
| Nkd1 | 4.42 | 6.3e-13 | Slc35e3 | -3.16 | 1.6e-12 |
| Prex2 | 6.20 | 6.3e-13 | Tgfb2 | -5.93 | 3.0e-12 |
| Fbln2 | 6.66 | 6.3e-13 | Pfkfb3 | -2.72 | 7.2e-12 |
| Hapln2 | 5.07 | 7.8e-13 | Shmt2 | -3.40 | 1.2e-11 |
| Ptgds | 5.30 | 1.6e-12 | Os9 | -2.56 | 1.7e-11 |
| Tubb4a | 4.23 | 1.6e-12 | Lrig3 | -2.42 | 3.2e-11 |
Contrast: Naive[2w] vs Naive[3w]
| Gene | log2(fold change) | P-value (adjusted) | Gene | log2(fold change) | P-value (adjusted) |
|---|---|---|---|---|---|
| Chordc1 | 1.02 | 1.3e-03 | Cdh13 | -2.31 | 5.4e-07 |
| LOC102546648 | 1.39 | 1.4e-03 | Flrt1 | -3.37 | 3.1e-05 |
| Ets1 | 1.15 | 2.3e-03 | Rgs7 | -3.57 | 3.1e-05 |
| Zfp217 | 1.29 | 2.3e-03 | Gpr37l1 | -1.48 | 2.6e-04 |
| Gprasp2 | 1.20 | 2.5e-03 | Gpr17 | -2.59 | 2.6e-04 |
| Atp13a5 | 2.82 | 2.6e-03 | Fnd3c2 | -3.79 | 3.3e-04 |
| Myc | 1.09 | 3.7e-03 | Otof | -3.93 | 3.3e-04 |
| P4ha1 | 1.01 | 4.3e-03 | Gria4 | -1.91 | 5.8e-04 |
| LOC679811 | 1.12 | 4.3e-03 | Scn1b | -1.72 | 7.5e-04 |
| Fam46a | 1.13 | 4.4e-03 | Nlgn3 | -1.54 | 7.5e-04 |
Contrast: SCI[2w] vs SCI[3w]
| Gene | log2(fold change) | P-value (adjusted) | Gene | log2(fold change) | P-value (adjusted) |
|---|---|---|---|---|---|
| Fbln2 | 6.05 | 5.2e-11 | Cmklr1 | -3.72 | 5.2e-11 |
| Dhcr24 | 3.48 | 2.7e-10 | Dse | -3.08 | 5.2e-11 |
| Fdft1 | 2.72 | 3.8e-10 | Eng | -3.97 | 5.2e-11 |
| Hmgcs1 | 2.99 | 3.8e-10 | Pgghg | -3.90 | 1.1e-10 |
| Fbn2 | 4.66 | 8.8e-10 | Plin2 | -3.58 | 1.1e-10 |
| Aacs | 2.16 | 1.4e-09 | Nt5e | -3.58 | 1.5e-10 |
| Cyp51 | 2.27 | 1.6e-09 | Acsf2 | -3.07 | 1.6e-10 |
| Acat2 | 3.21 | 1.8e-09 | ENSRNOG00000046171 | -3.58 | 1.7e-10 |
| Fdps | 2.20 | 3.4e-09 | Plxdc2 | -3.13 | 1.9e-10 |
| Epn2 | 1.95 | 4.6e-09 | Lcp1 | -3.28 | 1.9e-10 |
Table 5. GO terms and KEGG pathways (FDR<0.05)